Abstract


We motivate the design of a statically typed assembly language (TAL)
and present a type-preserving translation from System F to TAL. The
TAL we present is based on a conventional RISC assembly language, but
its static type system provides support for enforcing high-level
language abstractions, such as closures, tuples, and objects, as well
as user-defined abstract data types. The type system ensures that
well-typed programs cannot violate these abstractions. In addition,
the typing constructs place almost no restrictions on low-level
optimizations such as register allocation, instruction selection, or
instruction scheduling.

Our translation to TAL is specified as a sequence of type-preserving
transformations, including CPS and closure conversion phases;
type-correct source programs are mapped to type-correct assembly
language. A key contribution is an approach to polymorphic closure
conversion that is considerably simpler than previous work. The
compiler and typed assembly language provide a fully automatic way to
produce proof carrying code, suitable for use in systems where
untrusted and potentially malicious code must be checked for safety
before execution.


* This material is based on work supported in part by the AFOSR grant
F49620-97-1-0013, ARPA/RADC grant F30602-96-1-0317, ARPA/AF grant
F30602-95-1-0047, and AASERT grant N00014-95-1-0985. Any opinions,
findings, and conclusions or recommendations expressed in this
publication are those of the authors and do not reflect the views of
these agencies.


1  Introduction  and  Motivation 


Compiling a source language to a statically typed intermediate
language has compelling advantages over a conventional untyped
compiler. An optimizing compiler for a high-level language such as ML
may make as many as 20 passes over a single program, performing
sophisticated analyses and transformations such as CPS conversion
[14, 35, 2, 12, 18], closure conversion [20, 40, 19, 3, 26], unboxing
[22, 28, 38], subsumption elimination [9, 11], or region inference
[7]. Many of these optimizations require type information in order to
succeed, and even those that do not often benefit from the additional
structure supplied by a typing discipline [22, 18, 28, 37].
Furthermore, the ability to type-check intermediate code provides an
invaluable tool for debugging new transformations and optimizations
[41, 30].

Today a small number of compilers work with typed intermediate
languages in order to realize some or all of these benefits
[22, 34, 6, 41, 24, 39, 13]. However, in all of these compilers, there
is a conceptual line where types are lost. For instance, the TIL/ML
compiler preserves type information through approximately 80% of
compilation, but the remaining 20% is untyped.

We show how to eliminate the untyped portions of a compiler and, by so
doing, extend the approach of compiling with typed intermediate
languages to typed target languages. The target language in this paper
is a strongly typed assembly language (TAL) based on a generic RISC
instruction set. The type system for the language is surprisingly
standard, supporting tuples, polymorphism, existentials, and a very
restricted form of function pointer, yet it is sufficiently powerful
that we can automatically generate well-typed and efficient code from
high-level ML-like languages. Furthermore, we claim that the type
system does not seriously hinder low-level optimizations such as
register allocation, instruction selection, instruction scheduling,
and copy propagation.

Figure 1: Compilation of System F to Typed Assembly Language

TAL not only allows us to reap the benefits of types throughout a
compiler, but it also enables a practical system for executing
untrusted code both safely and efficiently. For example, as suggested
by the SPIN project [5], operating systems could allow users to
download TAL extensions into the kernel. The kernel could type-check
the TAL code to ensure that the code never accesses hidden resources
within the kernel, always calls kernel routines with the right number
and types of arguments, etc., and then assemble and dynamically link
the code into the kernel.¹ However, SPIN currently requires the user
to write the extension in a single high-level language (Modula-3) and
use a single trusted compiler (along with cryptographic signatures) in
order to ensure the safety of the extension. In contrast, a kernel
based on a typed assembly language could support extensions written in
a variety of high-level languages using a variety of untrusted
compilers, as the safety of the resulting assembly code can be checked
independently of the source code or the compiler. Furthermore,
critical inner-loops could be hand-written in assembly language in
order to achieve optimal performance. TAL could also be used to
support extensible web-browsers, extensible servers, active networks,
or any other “kernel” where security, performance, and language
independence are desired.

Software Fault Isolation (SFI) [47] also provides memory safety and
language independence. However, SFI requires the insertion of extra
“sandboxing” code, corresponding to dynamic type tests, to ensure that
the extension is safe. In contrast, TAL does not have the overhead of
the additional sandboxing code, as type-checking is performed offline.

With regard to these security properties, TAL is an instance of Necula
and Lee’s proof carrying code (PCC) [33, 32]. Necula suggests that the
relevant operational content of simple type systems may be encoded
using extensions to first-order predicate logic, and proofs of
relevant security properties such as memory safety may be
automatically verified [32]. In addition, Necula’s approach places no
restrictions on code sequences or instruction scheduling, whereas our
TAL has a small number of such restrictions (see Section 5.2).
However, in general there is no complete algorithm for constructing
the proof that the code satisfies the desired security properties. In
contrast, we provide a fully automatic procedure for generating typed
assembly language from a well-formed source term.

¹ Of course, while type safety implies many important security
properties such as memory safety, there are a variety of other
important security properties, such as termination, that do not
follow from type safety.


1.1  Overview 

In order to motivate the typing constructs in TAL and to justify our
claims about its expressiveness, we spend much of this paper sketching
a compiler from a variant of the polymorphic λ-calculus to TAL. The
anxious reader may wish to glance at Figure 11 for a sample TAL
program.

The compiler in this paper is structured as four translations between
the five typed calculi given in Figure 1. Each of these calculi is
used as a first-class programming calculus in the sense that each
translation accepts any well-typed program of its input calculus; it
does not assume that the input is the output from the preceding
translation. This allows the compiler to aggressively optimize code
between any of the translation steps. The inspiration for the phases
and their ordering is derived from SML/NJ [4, 2] (which is in turn
based on the Rabbit [40] and Orbit compilers [19]) except that types
are used throughout compilation.

The rest of this paper proceeds as follows: Section 2 presents λF, the
compiler’s source language, and sketches a typed CPS translation based
on Harper and Lillibridge [18] and Danvy and Filinski [12], to our
first intermediate language λK. Section 3 presents the next
intermediate language, λC, and gives a typed closure translation based
on, but considerably simpler than, the presentation of Minamide,
Morrisett, and Harper [26]. Section 4 presents the λA intermediate
language and a translation that makes allocation and initialization of
data structures explicit. At this point in compilation, the
intermediate code is essentially in a λ-calculus syntax for assembly
language, following the ideas of Wand [48]. Finally, Section 5
presents our typed assembly language and defines a translation from λA
to TAL. Section 6 discusses extensions to TAL to support language
constructs not considered here.

Space considerations prevent us from giving all of the details of the
term translations. We encourage those interested to read the companion
technical report [31], which gives formal static semantics for each of
the intermediate languages. Also, the report gives a full proof that
the type system for our assembly language is sound.


2 System F and CPS Conversion

The source language for our compiler, λF, is a call-by-value variant
of System F [15, 16, 36] (the polymorphic λ-calculus) augmented with
products and recursion on terms. The syntax for λF appears below:

types  τ ::= α | int | τ1 → τ2 | ∀α.τ | ⟨τ1, ..., τn⟩
terms  e ::= x | i | fix x(x1:τ1):τ2.e | e1 e2 | Λα.e | e[τ] |
             ⟨e1, ..., en⟩ | πi(e) | e1 p e2 | if0(e1, e2, e3)
prims  p ::= + | − | ×


K[α]             ≝ α
K[int]           ≝ int
K[τ1 → τ2]       ≝ (K[τ1], (K[τ2]) → void) → void
K[∀α.τ]          ≝ ∀[α].((K[τ]) → void) → void
K[⟨τ1, ..., τn⟩] ≝ ⟨K[τ1], ..., K[τn]⟩

Figure 2: CPS type translation


In order to simplify the presentation, we assume λF has only integers
as a base type (ranged over by the metavariable i). We use X⃗ to denote
a vector of syntactic objects drawn from X. For instance, ⟨τ⃗⟩ is
shorthand for a product type ⟨τ1, ..., τn⟩. The term
fix x(x1:τ1):τ2.e represents a recursively-defined function x with
argument x1 of type τ1 and body e. Hence, both x and x1 are bound
within e. Similarly, α is bound in e for Λα.e and bound in τ for ∀α.τ.
As usual, we consider syntactic objects to be equivalent up to
alpha-conversion of bound variables.

We interpret λF with a conventional call-by-value operational
semantics (not presented here). The static semantics is specified as a
set of inference rules that allow us to conclude judgments of the form
Δ; Γ ⊢F e : τ where Δ is a set containing the free type variables of
Γ, e, and τ; Γ assigns types to the free variables of e; and τ is the
type of e.

As a running example, we will be considering compilation and
evaluation of 6 factorial:

(fix f(n:int):int. if0(n, 1, n × f(n − 1))) 6.

The first compilation stage is conversion to continuation-passing
style (CPS). This stage names all intermediate computations and
eliminates the need for a control stack. All unconditional control
transfers, including function invocation and return, are achieved via
function call. The target calculus for this phase is λK:

types   τ ::= α | int | ∀[α⃗].(τ⃗) → void | ⟨τ⃗⟩
terms   e ::= v[τ⃗](v⃗) | if0(v, e1, e2) | halt[τ]v |
              let x = v in e |
              let x = πi(v) in e |
              let x = v1 p v2 in e
values  v ::= x | i | ⟨v⃗⟩ | fix x[α⃗](x1:τ1, ..., xk:τk).e

Code in λK is nearly linear: it consists of a series of let bindings
followed by a function call. The exception to this is the if0
construct, which is still a tree containing two expressions.

There is only one abstraction mechanism (fix), which abstracts both
type and value variables, simplifying the rest of the compiler. The
corresponding ∀ and → types are also combined. However, we abbreviate
∀[ ].(τ⃗) → void as (τ⃗) → void.

Unlike λF, functions do not return a value; instead, they invoke
continuations. The function notation “→ void” is intended to suggest
this. Execution is completed by the construct halt[τ]v, which accepts
a result value v of type τ and terminates the computation. Typically,
this construct is used by the top-level continuation. Aside from these
differences, the static and dynamic semantics for λK are completely
standard.

The implementation of typed CPS-conversion is based upon that of
Harper and Lillibridge [18]. The type translation K[·] mapping λF
types to λK types is given in Figure 2. The translation on terms is
given by a judgment Δ; Γ ⊢F eF : τ ⇝ vcps where Δ; Γ ⊢F eF : τ is a
derivable λF typing judgment, and vcps is a λK value with type
((K[τ]) → void) → void.
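The type translation of Figure 2 is a straightforward structural recursion, and a small executable sketch may make it easier to follow. The Python below is only an illustration under our own assumptions: the class names (`TVar`, `Int`, `Arrow`, `Forall`, `Prod`, `Code`) and the function `cps_type` are hypothetical and do not come from the paper or its implementation.

```python
from dataclasses import dataclass
from typing import Tuple

# Hypothetical AST for source (λF) types; the names are ours.
@dataclass(frozen=True)
class TVar:                # type variable α
    name: str

@dataclass(frozen=True)
class Int:                 # base type int
    pass

@dataclass(frozen=True)
class Arrow:               # function type τ1 → τ2 (λF only)
    arg: object
    res: object

@dataclass(frozen=True)
class Forall:              # polymorphic type ∀α.τ (λF only)
    var: str
    body: object

@dataclass(frozen=True)
class Prod:                # product type ⟨τ1, ..., τn⟩
    fields: Tuple[object, ...]

@dataclass(frozen=True)
class Code:                # λK code type ∀[α1..αm].(τ1, ..., τn) → void
    tvars: Tuple[str, ...]
    args: Tuple[object, ...]

def cont(t):
    """Continuation type (t) → void, with no abstracted type variables."""
    return Code((), (t,))

def cps_type(t):
    """Sketch of the translation K[·] from λF types to λK types."""
    if isinstance(t, (TVar, Int)):
        return t
    if isinstance(t, Arrow):
        # A function takes its argument plus a continuation for its result.
        return Code((), (cps_type(t.arg), cont(cps_type(t.res))))
    if isinstance(t, Forall):
        # A type abstraction takes only a continuation for its body.
        return Code((t.var,), (cont(cps_type(t.body)),))
    if isinstance(t, Prod):
        return Prod(tuple(cps_type(f) for f in t.fields))
    raise TypeError(f"not a source type: {t!r}")

int_to_int = cps_type(Arrow(Int(), Int()))
```

Here `int_to_int` is the encoding of (int, (int) → void) → void: the translated function receives its integer argument together with a continuation that expects the integer result.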

Following Danvy and Filinski [12], our term translation simultaneously
CPS converts the term, performs tail-call optimization, and eliminates
administrative redices (see the technical report [31] for details).
When applied to the factorial example, this translation yields the
following λK term:

(fix f [] (n:int, k:(int) → void).
   if0(n, k[](1),
       let x = n − 1 in
       f[](x, fix _ [] (y:int).
                let z = n × y
                in k[](z))))
  [] (6, fix _ [] (n:int). halt[int]n)
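For readers more comfortable with a conventional language, the same CPS-converted factorial can be mimicked in Python. This is only an analogy under stated assumptions: Python is untyped and does not eliminate tail calls, so the sketch shows the shape of the code (every intermediate result is named and every return becomes a call to a continuation), not the absence of a control stack. The names `fact_direct` and `fact_cps` are ours.

```python
def fact_direct(n):
    # Direct style: the multiplication happens after the recursive call returns.
    return 1 if n == 0 else n * fact_direct(n - 1)

def fact_cps(n, k):
    # CPS: the "rest of the computation" is passed explicitly as k.
    if n == 0:
        return k(1)
    x = n - 1
    # The new continuation multiplies by n and then continues with k.
    return fact_cps(x, lambda y: k(n * y))

# The top-level continuation plays the role of halt[int]: it just yields n.
result = fact_cps(6, lambda n: n)
```

Both versions compute 720 for the running example; only the control structure differs.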


3  Simplified  Polymorphic  Closure  Conversion 

The second compilation stage is closure conversion, which separates
program code from data. This is done in two steps. Most of the work is
done in the first step, closure conversion proper, which rewrites all
functions so that they are closed. In order to do this, any variables
from the context that are used in the function must be taken as
additional arguments. These additional arguments are collected in an
environment, which is paired with the (now closed) code to make a
closure. In the second step, hoisting, closed function code is lifted
to the top of the program, achieving the desired separation between
code and data. Here we discuss closure conversion proper; the hoisting
step is elementary and is discussed briefly at the end of the section.

Our approach to typed closure conversion is based on that of Minamide
et al. [26]: If two functions with the same type but different free
variables (and therefore different environment types) were naively
closure converted, the types of their closures would not be the same.
To prevent this, closures are given existential types [27] where the
type of the environment is held abstract.

However, we propose an approach to polymorphic closure conversion that
is considerably simpler than that of Minamide et al., which requires
both abstract kinds and translucent types. Both of these mechanisms
arise because Minamide et al. desire a type-passing interpretation of
polymorphism where types are constructed and passed to polymorphic
functions at run-time. Under a type-passing interpretation,
polymorphic instantiation cannot easily be treated via substitution,
as this requires making a copy of the code at run-time. Instead, a
closure is constructed that consists of closed code, a value
environment mapping variables to values, and a type environment
mapping type variables to types.

In our approach, we assume a type-erasure interpretation of
polymorphism as in The Definition of Standard ML [25], and polymorphic
instantiation is semantically handled via substitution (i.e., making a
copy of the code with the types substituted for the type variables).
As we will ultimately erase the types on terms before execution, the
“copies” can (and will) be represented by the same term. This avoids
the need for abstract kinds (since there are no type environments), as
well as translucent types. A type-erasure interpretation is not
without its costs: It precludes some advanced implementation
techniques [28, 43, 1, 29] and has subtle interactions with
side-effects. We address the latter concern by forcing polymorphic
abstractions to be values [42, 49] (i.e., they must be syntactically
attached to value abstractions).

To support this interpretation, we consider the partial application of
functions to type arguments to be values. For example, suppose v has
the type ∀[α⃗, β⃗].(τ⃗) → void where the type variables α⃗ are intended
for the type environment and the type variables β⃗ are intended for the
function’s type arguments. If σ⃗ is a vector of types to be used for
the type environment, then the partial instantiation v[σ⃗] is still
treated as a value and has type ∀[β⃗].(τ⃗[σ⃗/α⃗]) → void. The syntax of λC
is otherwise similar to λK:

types   τ ::= α | int | ∀[α⃗].(τ⃗) → void | ⟨τ⃗⟩ | ∃α.τ
terms   e ::= v(v⃗) | if0(v, e1, e2) | halt[τ]v |
              let x = v in e |
              let x = πi(v) in e |
              let x = v1 p v2 in e |
              let [α, x] = unpack v in e
values  v ::= x | i | ⟨v⃗⟩ | v[τ] | pack [τ, v] as ∃α.τ′ |
              fixcode x[α⃗](x1:τ1, ..., xk:τk).e


Our closure conversion technique is formalized as a type-directed
translation in the companion technical report [31]. We summarize here
by giving the type translation for function types:

C[∀[α⃗].(τ1, ..., τk) → void] ≝ ∃β.⟨∀[α⃗].(β, C[τ1], ..., C[τk]) → void, β⟩

The existentially-quantified variable β is the type of the value
environment for the closure. The closure itself is a pair consisting
of a piece of code that is instantiated with the type environment, and
the value environment. The instantiated code takes as arguments the
type arguments and value arguments of the original abstraction, as
well as the value environment of the closure. The rules for closure
converting λK abstractions and applications are given in Figure 3.

After closure conversion, all functions are closed and may be hoisted
out to the top level without difficulty. After hoisting, programs
belong to a calculus that is similar to λC except that fixcode is no
longer a value. Rather, code must be referred to by labels (ℓ) and the
syntax of programs includes a letrec prefix, which binds labels to
code. A closure converted and hoisted factorial, as it might appear
after some optimization, is shown in Figure 4.
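The effect of closure conversion and hoisting on the running example can be imitated in Python, again only as an untyped analogy. In this sketch a closure is an explicit pair of closed code and a value environment, and every code block is a top-level function that receives its environment as its first argument. The names `Closure`, `apply_closure`, `l_f`, `l_cont` and `l_halt` are our own; the existential type that hides the environment type has no counterpart here, since Python performs no static checking.

```python
from typing import Any, Callable, NamedTuple

class Closure(NamedTuple):
    code: Callable[..., Any]   # closed code; expects the environment first
    env: Any                   # value environment, opaque to callers

def apply_closure(clo, *args):
    # Mirrors the application rule: open the package, project the code and
    # the environment, then call the code on the environment and arguments.
    return clo.code(clo.env, *args)

# Hoisted code blocks. None of them captures a local variable; they refer
# only to their parameters and to other top-level blocks (labels).
def l_f(env, n, k):
    if n == 0:
        return apply_closure(k, 1)
    x = n - 1
    # The continuation needs n and k, so they become its environment.
    return l_f(env, x, Closure(l_cont, (n, k)))

def l_cont(env, y):
    n, k = env                 # open the environment
    z = n * y
    return apply_closure(k, z)

def l_halt(env, n):
    return n

result = l_f((), 6, Closure(l_halt, ()))
```

Because each block is closed, a caller can invoke any continuation uniformly through `apply_closure` without knowing what its environment contains, which is the property the existential type enforces statically in λC.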


4  Explicit  Allocation 

The λC intermediate language still has an atomic constructor for
forming tuples, but machines must allocate space for a tuple and fill
it out field by field; the allocation stage makes this process
explicit. The syntax of λA, the target calculus of this stage, is
similar to that of λC, and appears in Figure 5. Note that there is no
longer a value form for tuples. The creation of an n-element tuple
becomes a computation that is separated into an allocation step and n
initialization steps. For example, if v0 and v1 are integers, the pair
⟨v0, v1⟩ is created as follows (where types have been added for
clarity):

let x0:⟨int⁰, int⁰⟩ = malloc[int, int]
    x1:⟨int¹, int⁰⟩ = x0[0] ← v0
    x :⟨int¹, int¹⟩ = x1[1] ← v1


The “x0 = malloc[int, int]” step allocates an uninitialized tuple and
binds the address (i.e., label) of the tuple to x0. The “0”
superscripts on the types of the fields indicate that the fields are
uninitialized, and hence no projection may be performed on those
fields. The “x1 = x0[0] ← v0” step updates the first field of the
tuple with the value v0 and binds the address of the tuple to x1. Note
that x1 is assigned a type where the first field has a “1”
superscript, indicating that this field is initialized. Finally, the
“x = x1[1] ← v1” step initializes the second field of the tuple with
v1 and binds the address of the tuple to x, which is assigned the
fully initialized type ⟨int¹, int¹⟩. Hence, both π0 and π1 are allowed
on x.
Δ ⊢K τi    Γ = y1:τ1′, ..., ym:τm′    Δ = {β⃗}
Δ[α⃗]; Γ[x:τcode, x1:τ1, ..., xn:τn] ⊢K e : void ⇝ e′
----------------------------------------------------------------- (abs)
Δ; Γ ⊢K fix x[α⃗](x1:τ1, ..., xn:τn).e : τcode
        ⇝ pack [τenv, ⟨vcode[β⃗], venv⟩] as C[τcode]

where τcode = ∀[α⃗].(τ1, ..., τn) → void
      τenv  = ⟨C[τ1′], ..., C[τm′]⟩
      venv  = ⟨y1, ..., ym⟩
      vcode = fixcode xcode[β⃗, α⃗](xenv:τenv, x1:C[τ1], ..., xn:C[τn]).
                let x = pack [τenv, ⟨xcode[β⃗], xenv⟩] as C[τcode] in
                let y1 = π1(xenv) in
                ...
                let ym = πm(xenv) in e′


Δ; Γ ⊢K v : ∀[α⃗].(τ1, ..., τn) → void ⇝ v′    Δ ⊢K σi
Δ; Γ ⊢K v1 : τ1[σ⃗/α⃗] ⇝ v1′   ...   Δ; Γ ⊢K vn : τn[σ⃗/α⃗] ⇝ vn′
----------------------------------------------------------------- (app)
Δ; Γ ⊢K v[σ⃗](v1, ..., vn) : void ⇝ e

where e = let [αenv, x] = unpack v′ in
          let xcode = π1(x) in
          let xenv = π2(x) in
          xcode[C[σ⃗]](xenv, v1′, ..., vn′)


Figure 3: Closure Conversion for λK Abstractions and Applications


letrec ℓf = (* main factorial code block *)
         code[](env:⟨⟩, n:int, k:τk).
           if0(n, (* true branch: continue with 1 *)
                  let [β, kunpack] = unpack k in
                  let kcode = π0(kunpack) in
                  let kenv = π1(kunpack)
                  in
                    kcode(kenv, 1),
                  (* false branch: recurse with n − 1 *)
                  let x = n − 1 in
                  (* compute factorial of n − 1 and continue to k *)
                    ℓf(env, x, pack [⟨int, τk⟩, ⟨ℓcont, ⟨n, k⟩⟩] as τk))
       ℓcont = (* code block for continuation after factorial computation *)
         code[](env:⟨int, τk⟩, y:int).
           (* open the environment *)
           let n = π0(env) in
           let k = π1(env) in
           (* continue with n × y *)
           let z = n × y in
           let [β, kunpack] = unpack k in
           let kcode = π0(kunpack) in
           let kenv = π1(kunpack)
           in
             kcode(kenv, z)
       ℓhalt = (* code block for top-level continuation *)
         code[](env:⟨⟩, n:int). halt[int](n)
in
  ℓf(⟨⟩, 6, pack [⟨⟩, ⟨ℓhalt, ⟨⟩⟩] as τk)
where τk is ∃α.⟨(α, int) → void, α⟩


Figure 4: Closure Converted, Hoisted Factorial Code


types                 τ ::= α | int | ∀[α⃗].(τ⃗) → void | ⟨τ1^φ1, ..., τn^φn⟩ | ∃α.τ
initialization flags  φ ::= 0 | 1
terms                 e ::= let d⃗ in v(v⃗) | let d⃗ in if0(v, e1, e2) | let d⃗ in halt[τ]v
declarations          d ::= x = v | x = πi(v) | x = v1 p v2 | [α, x] = unpack v |
                            x = malloc[τ⃗] | x = v[i] ← v′
values                v ::= x | ℓ | i | v[τ] | pack [τ, v] as ∃α.τ′
blocks                b ::= ℓ = code[α⃗](x1:τ1, ..., xk:τk).e
programs              P ::= letrec b⃗ in e


Figure 5: Syntax of λA


Like all the intermediate languages of our compiler, this code
sequence need not be atomic; it may be rearranged or optimized in any
well-typed manner. The initialization flags on the types ensure that
we cannot project a field unless it has been initialized. Furthermore,
our syntactic value restriction ensures there is no unsoundness in the
presence of polymorphism. However, it is important to note that we
interpret x[i] ← v as an imperative operation, and thus at the end of
the sequence, x0, x1, and x are all aliases for the same location,
even though they have different (but compatible) types. Consequently,
the initialization flags do not prevent a field from being initialized
twice. It is possible to use monads [44, 21] or linear types
[17, 45, 46] to ensure that a tuple is initialized exactly once, but
we have avoided these approaches in the interest of a simpler type
system.
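The role of the initialization flags can be illustrated with a toy checker. The sketch below is an assumption-laden simplification, not the paper's type system: field types are plain strings, a tuple type is a Python tuple of (type, flag) pairs, and the helper names `malloc_type`, `init_type` and `proj_type` are hypothetical. It tracks only how the static type changes across the malloc and initialization steps.

```python
def malloc_type(field_types):
    """Type of a freshly allocated tuple: every field carries flag 0."""
    return tuple((t, 0) for t in field_types)

def init_type(tuple_type, i, value_type):
    """Type of the tuple after the store to field i."""
    field_type, _flag = tuple_type[i]
    if field_type != value_type:
        raise TypeError(f"field {i} expects {field_type}, got {value_type}")
    # The result type marks field i as initialized. The old flag is not
    # inspected, so initializing a field twice is accepted.
    return tuple_type[:i] + ((field_type, 1),) + tuple_type[i + 1:]

def proj_type(tuple_type, i):
    """Type of projecting field i; rejected unless the flag is 1."""
    field_type, flag = tuple_type[i]
    if flag != 1:
        raise TypeError(f"field {i} may be uninitialized")
    return field_type

t0 = malloc_type(["int", "int"])   # type of x0: both fields flagged 0
t1 = init_type(t0, 0, "int")       # type of x1: first field flagged 1
t2 = init_type(t1, 1, "int")       # type of x:  both fields flagged 1
```

With these definitions `proj_type(t2, 1)` succeeds while `proj_type(t1, 1)` raises a `TypeError`, and `init_type(t2, 0, "int")` is accepted, matching the observation that the flags rule out reading uninitialized fields but not repeated initialization.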

The type translation from λC to λA is trivial. All that happens is
that initialization flags are added to each field of tuple types:

A[⟨τ1, ..., τn⟩] ≝ ⟨A[τ1]¹, ..., A[τn]¹⟩

The term translation is also straightforward. As mentioned above,
tuple values are exploded into a sequence of declarations consisting
of a malloc and appropriate initializations.


5  Typed  Assembly  Language 

The final compilation stage, code generation, converts λA to TAL. All
of the major typing constructs in TAL are present in λA and, indeed,
code generation is largely syntactic. To summarize the type structure
at this point, we have a combined abstraction mechanism that may
simultaneously abstract a type environment, a set of type arguments,
and a set of value arguments. Values of these types may be partially
applied to type environments and remain values. We have existential
types to support closures and other data abstractions. Finally, we
have n-tuples with flags on the fields indicating whether the field
has been initialized.

In the remainder of this section, we present the syntax of TAL
(Section 5.1), its dynamic semantics (Section 5.2), and its full
static semantics (Section 5.3). Finally, we sketch the translation
from λA to TAL (Section 5.4).


5.1  TAL  Syntax 

A key technical distinction between λA and TAL is that λA uses
alpha-varying variables, whereas TAL uses register names, which like
labels on records, do not alpha-vary.² Hence, some register calling
convention must be used in code generation, and the calling convention
needs to be made explicit in the types. Following standard practice,
we assume an infinite supply of registers. Mapping to a language with
a finite number of registers may be performed by spilling registers
into a tuple, and reloading values from this tuple when necessary.

The types of TAL include type variables, integers, existentials, and
tuple types augmented with initialization flags, as in λA. The type
∀[α⃗]{r1:τ1, ..., rn:τn} is used to describe entry points of basic
blocks (i.e., code labels) and is the TAL analog of the λA function
type, ∀[α⃗].(τ1, ..., τn) → void. The key difference is that we assign
fixed registers to the arguments of the code. Intuitively, to jump to
a block of code of this type, the type variables α⃗ must be suitably
instantiated, and registers r1 through rn must contain values of type
τ1 through τn, respectively.
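The intuition about jumping to a block of type ∀[α⃗]{r1:τ1, ..., rn:τn} can be phrased as a small check. The following sketch is deliberately crude and rests on assumptions of our own: types are atomic strings, instantiation is a dictionary lookup, and registers not mentioned by the code type are simply ignored. The function names `subst` and `can_jump` are hypothetical and this is not the paper's typing rule for jmp.

```python
def subst(t, inst):
    """Instantiate a type: replace it if it is an abstracted type variable."""
    return inst.get(t, t)

def can_jump(tvars, required, inst, regs):
    """Check a jump to code of type forall[tvars]{required}.

    tvars:    abstracted type variables of the target block
    required: register name -> type the block assumes for it
    inst:     type variable -> type chosen at the jump
    regs:     register name -> type currently held
    """
    # Every abstracted type variable must be instantiated, and no others.
    if set(inst) != set(tvars):
        return False
    # Each register named by the block must hold the instantiated type.
    return all(
        r in regs and regs[r] == subst(t, inst)
        for r, t in required.items()
    )

block_tvars = ("a",)
block_regs = {"r1": "int", "r2": "a"}
ok = can_jump(block_tvars, block_regs, {"a": "int"},
              {"r1": "int", "r2": "int", "r3": "int"})
```

In this example the jump is accepted because instantiating `a` with `int` makes both register assumptions hold; dropping `r2` from the current register file, or omitting the instantiation of `a`, makes the check fail.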

Another technical point is that registers may contain only word
values, which are integers, pointers into the heap (i.e., labels), and
instantiated or packed word values. Tuples and code blocks are large
values and must be heap allocated. In this manner, TAL makes the
layout of data in memory explicit.

With these technical points in mind, we present the full syntax of TAL
in Figure 6. A TAL abstract machine or program consists of a heap, a
register file and a sequence of instructions. The heap is a mapping of
labels to heap values, which are tuples and code. The register file is
a mapping of registers (ranged over by the metavariable r) to word
values. Although heap values are not word values, the labels that
point to them are. The other word values are integers, instantiations
of word values, existential packages, and junk values (?τ), which are
used by the operational semantics to represent uninitialized data. A
small value is either a word value, a register, or an instantiated or
packed small value; this distinction is drawn because a register must
contain a word, not another register. Code blocks are linear sequences
of instructions that abstract a set of type variables, and state their
register assumptions. The sequence of instructions is always
terminated by a jmp or halt instruction. Expressions that differ only
by alpha-variation are considered identical, as are programs that
differ only in the order of fields in a heap or register file.

² Indeed, the register file may be viewed as a record, and register
names as field labels for this record.


5.2  TAL  Operational  Semantics 

The operational semantics of TAL is presented in Figure 7 as a
deterministic rewriting system P ↦ P′ that maps programs to programs.
Although, as discussed above, we ultimately intend a type-erasure
interpretation, we do not erase the types from the operational
semantics presented here, so that we may more easily state and prove a
subject reduction theorem (Lemma 5.1).

If we erase the types from the instructions, then their meaning is
intuitively clear and there is a one-to-one correspondence with
conventional assembly language instructions. The two exceptions to
this are the unpack and malloc instructions, which are discussed
below. The well-formed terminal configurations of the rewriting system
have the form (H, R{r1 ↦ w}, halt[τ]). This corresponds to a machine
state where the register r1 contains the value computed by the
computation. All other terminal configurations are considered to be
“stuck” programs.

Intuitively, jmp v, where v is a value of the form ℓ[τ⃗], transfers
control to the code bound to the label ℓ, instantiating the abstracted
type variables of ℓ with τ⃗. The ld rd, rs[i] instruction loads the
i-th component of the tuple bound to the label in rs, and places this
word value in rd. Conversely, sto rd[i], rs places the word value in
rs at the i-th position in the tuple bound to the label in rd. The
bnz r, v instruction tests the value in r to see if it is zero. If so,
then control continues with the next instruction. Otherwise control is
transferred to v as with the jmp instruction.

The instruction unpack $[\alpha, r_d], v$, where $v$ is a value of
the form pack $[\tau', v']$ as $\tau$, is evaluated by substituting
$\tau'$ for $\alpha$ in the remainder of the sequence of instructions
currently being executed, and by binding the register $r_d$
to the value $v'$. If types are erased, the unpack instruction
can be implemented by using a mov instruction.

As at the $\lambda^A$ level, malloc $r_d[\tau_1, \ldots, \tau_n]$ allocates a
fresh, uninitialized tuple in the heap and binds the address
of this tuple to $r_d$. Of course, real machines do not
provide a primitive malloc instruction. Our intention
is that, as types are erased, malloc is expanded into a
fixed instruction sequence that allocates a tuple of the
appropriate size. Because this instruction sequence is
abstract, it prevents optimization from re-ordering and
interleaving these underlying instructions with the surrounding
TAL code. However, this is the only instruction
sequence that is abstract in TAL.
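
The rewriting rules described above can be sketched as a small type-erased interpreter. This is only an illustration under stated assumptions, not the authors' implementation: the tuple encoding of instructions, the names `run`, `is_reg`, `FACT_HEAP` and `FACT_MAIN`, the `heap_N` labels for fresh tuples, and the choice to give malloc a size instead of a list of types are all hypothetical. Following the type-erasure reading, unpack behaves like mov, and the pack operations of the factorial program of Figure 11 are dropped because they erase to no-ops.

```python
import itertools
import operator

ARITH = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def is_reg(v):
    """Assumed convention: registers are spelled r1, r2, ...; other operands are labels or integers."""
    return isinstance(v, str) and v[:1] == "r" and v[1:].isdigit()

def run(heap, regs, seq):
    """Rewrite the program (H, R, S) until halt and return the word left in r1."""
    heap, regs = dict(heap), dict(regs)
    fresh = itertools.count()

    def val(v):  # plays the role of R-hat on type-erased small values
        return regs[v] if is_reg(v) else v

    while True:
        (op, *args), rest = seq[0], seq[1:]
        if op == "halt":
            return regs["r1"]
        if op == "jmp":
            seq = heap[val(args[0])]
            continue
        if op == "bnz":  # fall through on zero, branch otherwise
            r, v = args
            seq = heap[val(v)] if regs[r] != 0 else rest
            continue
        if op in ARITH:
            rd, rs, v = args
            regs[rd] = ARITH[op](regs[rs], val(v))
        elif op in ("mov", "unpack"):  # unpack erases to mov
            rd, v = args
            regs[rd] = val(v)
        elif op == "malloc":  # fresh, uninitialized tuple of n words
            rd, n = args
            label = f"heap_{next(fresh)}"
            heap[label] = [None] * n
            regs[rd] = label
        elif op == "ld":
            rd, rs, i = args
            regs[rd] = heap[regs[rs]][i]
        elif op == "sto":
            rd, i, rs = args
            heap[regs[rd]][i] = regs[rs]
        else:
            raise ValueError(f"unknown instruction {op!r}")
        seq = rest

# Type-erased transcription of the factorial program of Figure 11.
FACT_HEAP = {
    "l_fact": [("bnz", "r2", "l_nonzero"), ("unpack", "r3", "r3"),
               ("ld", "r4", "r3", 0), ("ld", "r1", "r3", 1),
               ("mov", "r2", 1), ("jmp", "r4")],
    "l_nonzero": [("sub", "r6", "r2", 1), ("malloc", "r4", 2),
                  ("sto", "r4", 0, "r2"), ("sto", "r4", 1, "r3"),
                  ("malloc", "r3", 2), ("mov", "r5", "l_cont"),
                  ("sto", "r3", 0, "r5"), ("sto", "r3", 1, "r4"),
                  ("mov", "r2", "r6"), ("jmp", "l_fact")],
    "l_cont": [("ld", "r3", "r1", 0), ("ld", "r4", "r1", 1),
               ("mul", "r2", "r3", "r2"), ("unpack", "r4", "r4"),
               ("ld", "r5", "r4", 0), ("ld", "r1", "r4", 1), ("jmp", "r5")],
    "l_halt": [("mov", "r1", "r2"), ("halt",)],
}
FACT_MAIN = [("malloc", "r3", 2), ("mov", "r1", "l_halt"), ("sto", "r3", 0, "r1"),
             ("malloc", "r1", 0), ("sto", "r3", 1, "r1"),
             ("mov", "r2", 6), ("jmp", "l_fact")]

print(run(FACT_HEAP, {}, FACT_MAIN))  # factorial of 6
```

A program that would be "stuck" in the sense above (for example, a jump to an unbound label) surfaces in this sketch as a Python exception; the type system is what rules such programs out statically.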

Real machines also have a finite amount of heap space.
It is straightforward to link our TAL to a conservative
garbage collector [8] in order to reclaim unused heap
values. Support for an accurate collector would require
introducing tags so that we may distinguish pointers
from integers, or else require a type-passing interpretation
[43, 29]. The tagging approach is readily accomplished
in our framework.


5.3  TAL  Static  Semantics 

The static semantics for TAL appears in Figures 9 and
10 and consists of thirteen judgments, summarized in
Figure 8. The static semantics is inspired by and follows
the conventions of Morrisett and Harper’s $\lambda^{\rightarrow\forall}_{gc}$ [29]. A
weak notion of subtyping is included for technical reasons
related to subject reduction, so that an initialized
tuple may still be given its old uninitialized type (see
the technical report [31] for details).

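Lemma 5.1 (Subject Reduction) If $\vdash_{\mathrm{TAL}} P$ and
$P \longmapsto P'$ then $\vdash_{\mathrm{TAL}} P'$.

Lemma 5.2 (Progress) If $\vdash_{\mathrm{TAL}} P$ then either:

1. there exists $P'$ such that $P \longmapsto P'$, or

2. $P$ is of the form $(H, R\{\mathtt{r1} \mapsto w\}, \mathtt{halt}[\tau])$ where
there exists $\Psi$ such that $\vdash_{\mathrm{TAL}} H : \Psi$ and
$\Psi; \emptyset \vdash_{\mathrm{TAL}} w : \tau$.


Corollary 5.3 (Type Soundness) If $\vdash_{\mathrm{TAL}} P$, then
there is no stuck $P'$ such that $P \longmapsto^{*} P'$.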


5.4  Code  Generation 

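The type translation, $T[\![\,\cdot\,]\!]$, from $\lambda^A$ to TAL is
straightforward. The only point of interest is the translation of
$\forall$ types, which must assign registers to value arguments:

$$T[\![\forall[\vec{\alpha}].(\tau_1, \ldots, \tau_n) \to \mathit{void}]\!] \;\stackrel{\mathrm{def}}{=}\; \forall[\vec{\alpha}].\{\mathtt{r1}{:}\,T[\![\tau_1]\!], \ldots, \mathtt{r}n{:}\,T[\![\tau_n]\!]\}$$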
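
As a small worked instance of this translation, consider a code type that takes a value of abstract type $\alpha$ and an integer. The text only says the remaining cases are straightforward, so the instance below assumes (it is not stated explicitly) that $T[\![\cdot]\!]$ leaves type variables and int unchanged:

```latex
T[\![\forall[\alpha].(\alpha, \mathit{int}) \to \mathit{void}]\!]
  \;=\; \forall[\alpha].\{\mathtt{r1}{:}\,T[\![\alpha]\!],\ \mathtt{r2}{:}\,T[\![\mathit{int}]\!]\}
  \;=\; \forall[\alpha].\{\mathtt{r1}{:}\,\alpha,\ \mathtt{r2}{:}\,\mathit{int}\}
```

The first value argument is assigned to r1 and the second to r2, which is the same register convention that the code blocks of Figure 11 declare for an environment and an integer argument.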

The term translation is also straightforward, except that
we must keep track of the register to which a variable
maps, as well as the registers used thus far so that we
may allocate fresh registers. When translating a block
of code, we assume that registers r1 through r$n$ contain
the value arguments. We informally summarize the rest
of the translation as follows:

•  $x = v$ is mapped to mov $r_x, v$.

•  $x = \pi_i(v)$ is mapped to the sequence:

mov $r_x, v$ ; ld $r_x, r_x[i]$
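$$
\begin{array}{llcl}
\text{types} & \tau & ::= & \alpha \mid \mathit{int} \mid \forall[\vec{\alpha}].\Gamma \mid \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle \mid \exists\alpha.\tau \\
\text{initialization flags} & \varphi & ::= & 0 \mid 1 \\
\text{heap types} & \Psi & ::= & \{\ell_1{:}\tau_1, \ldots, \ell_n{:}\tau_n\} \\
\text{register file types} & \Gamma & ::= & \{r_1{:}\tau_1, \ldots, r_n{:}\tau_n\} \\
\text{type contexts} & \Delta & ::= & \vec{\alpha} \\
\text{registers} & r & \in & \{\mathtt{r1}, \mathtt{r2}, \mathtt{r3}, \ldots\} \\
\text{word values} & w & ::= & \ell \mid i \mid\ ?\tau \mid w[\tau] \mid \mathtt{pack}\ [\tau, w]\ \mathtt{as}\ \tau' \\
\text{small values} & v & ::= & r \mid w \mid v[\tau] \mid \mathtt{pack}\ [\tau, v]\ \mathtt{as}\ \tau' \\
\text{heap values} & h & ::= & \langle w_1, \ldots, w_n \rangle \mid \mathtt{code}[\vec{\alpha}]\Gamma.S \\
\text{heaps} & H & ::= & \{\ell_1 \mapsto h_1, \ldots, \ell_n \mapsto h_n\} \\
\text{register files} & R & ::= & \{r_1 \mapsto w_1, \ldots, r_n \mapsto w_n\} \\
\text{instructions} & \iota & ::= & \mathtt{add}\ r_d, r_s, v \mid \mathtt{bnz}\ r, v \mid \mathtt{ld}\ r_d, r_s[i] \mid \\
 & & & \mathtt{malloc}\ r_d[\vec{\tau}] \mid \mathtt{mov}\ r_d, v \mid \mathtt{mul}\ r_d, r_s, v \mid \\
 & & & \mathtt{sto}\ r_d[i], r_s \mid \mathtt{sub}\ r_d, r_s, v \mid \mathtt{unpack}\ [\alpha, r_d], v \\
\text{instruction sequences} & S & ::= & \iota; S \mid \mathtt{jmp}\ v \mid \mathtt{halt}[\tau] \\
\text{programs} & P & ::= & (H, R, S)
\end{array}
$$

Figure 6: Syntax of TAL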


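$(H, R, S) \longmapsto P$ where

$$
\begin{array}{ll}
\text{if } S = & \text{then } P = \\[2pt]
\mathtt{add}\ r_d, r_s, v;\ S' & (H,\ R\{r_d \mapsto R(r_s) + \hat{R}(v)\},\ S') \\
 & \text{and similarly for mul and sub} \\[2pt]
\mathtt{bnz}\ r, v;\ S' \quad \text{when } R(r) = 0 & (H, R, S') \\[2pt]
\mathtt{bnz}\ r, v;\ S' \quad \text{when } R(r) = i \text{ and } i \neq 0 & (H,\ R,\ S''[\vec{\tau}/\vec{\alpha}]) \\
 & \text{where } \hat{R}(v) = \ell[\vec{\tau}] \text{ and } H(\ell) = \mathtt{code}[\vec{\alpha}]\Gamma.S'' \\[2pt]
\mathtt{jmp}\ v & (H,\ R,\ S'[\vec{\tau}/\vec{\alpha}]) \\
 & \text{where } \hat{R}(v) = \ell[\vec{\tau}] \text{ and } H(\ell) = \mathtt{code}[\vec{\alpha}]\Gamma.S' \\[2pt]
\mathtt{ld}\ r_d, r_s[i];\ S' & (H,\ R\{r_d \mapsto w_i\},\ S') \\
 & \text{where } R(r_s) = \ell \text{ and } H(\ell) = \langle w_0, \ldots, w_{n-1} \rangle \text{ with } 0 \le i < n \\[2pt]
\mathtt{malloc}\ r_d[\tau_1, \ldots, \tau_n];\ S' & (H\{\ell \mapsto \langle ?\tau_1, \ldots, ?\tau_n \rangle\},\ R\{r_d \mapsto \ell\},\ S') \\
 & \text{where } \ell \notin H \\[2pt]
\mathtt{mov}\ r_d, v;\ S' & (H,\ R\{r_d \mapsto \hat{R}(v)\},\ S') \\[2pt]
\mathtt{sto}\ r_d[i], r_s;\ S' & (H\{\ell \mapsto \langle w_0, \ldots, w_{i-1}, R(r_s), w_{i+1}, \ldots, w_{n-1} \rangle\},\ R,\ S') \\
 & \text{where } R(r_d) = \ell \text{ and } H(\ell) = \langle w_0, \ldots, w_{n-1} \rangle \text{ with } 0 \le i < n \\[2pt]
\mathtt{unpack}\ [\alpha, r_d], v;\ S' & (H,\ R\{r_d \mapsto w\},\ S'[\tau/\alpha]) \\
 & \text{where } \hat{R}(v) = \mathtt{pack}\ [\tau, w]\ \mathtt{as}\ \tau'
\end{array}
$$

$$
\text{where } \hat{R}(v) =
\begin{cases}
R(r) & \text{when } v = r \\
w & \text{when } v = w \\
\hat{R}(v')[\tau] & \text{when } v = v'[\tau] \\
\mathtt{pack}\ [\tau, \hat{R}(v')]\ \mathtt{as}\ \tau' & \text{when } v = \mathtt{pack}\ [\tau, v']\ \mathtt{as}\ \tau'
\end{cases}
$$

Figure 7: Operational Semantics of TAL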
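$$
\begin{array}{ll}
\text{Judgment} & \text{Meaning} \\[2pt]
\Delta \vdash_{\mathrm{TAL}} \tau & \tau \text{ is a valid type} \\
\vdash_{\mathrm{TAL}} \Psi & \Psi \text{ is a valid heap type (no context is used because heap types must be closed)} \\
\Delta \vdash_{\mathrm{TAL}} \Gamma & \Gamma \text{ is a valid register file type} \\
\Delta \vdash_{\mathrm{TAL}} \tau_1 \le \tau_2 & \tau_1 \text{ is a subtype of } \tau_2 \\
\Delta \vdash_{\mathrm{TAL}} \Gamma_1 \subseteq \Gamma_2 & \text{the register file type } \Gamma_1 \text{ weakens } \Gamma_2 \\
\vdash_{\mathrm{TAL}} H : \Psi & \text{the heap } H \text{ has type } \Psi \\
\Psi \vdash_{\mathrm{TAL}} R : \Gamma & \text{the register file } R \text{ has type } \Gamma \\
\Psi \vdash_{\mathrm{TAL}} h : \tau & \text{the heap value } h \text{ has type } \tau \\
\Psi; \Delta \vdash_{\mathrm{TAL}} w : \tau & \text{the word value } w \text{ has type } \tau \\
\Psi; \Delta \vdash_{\mathrm{TAL}} w : \tau^{\varphi} & \text{either the word value } w \text{ has type } \tau \text{, or } w \text{ is } ?\tau \text{ and } \varphi \text{ is } 0 \\
\Psi; \Delta; \Gamma \vdash_{\mathrm{TAL}} v : \tau & \text{the small value } v \text{ has type } \tau \\
\Psi; \Delta; \Gamma \vdash_{\mathrm{TAL}} S & S \text{ is a valid sequence of instructions} \\
\vdash_{\mathrm{TAL}} (H, R, S) & (H, R, S) \text{ is a valid program}
\end{array}
$$

Figure 8: TAL Static Semantic Judgments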


•  $x = v_1\ p\ v_2$ is mapped to the sequence:

mov $r_x, v_1$ ; arith $r_x, r_x, v_2$

where arith is the appropriate arithmetic instruction.

•  $[\alpha, x] = \mathrm{unpack}\ v$ is mapped to unpack $[\alpha, r_x], v$.

•  $x = \mathrm{malloc}[\vec{\tau}]$ is mapped to malloc $r_x[\vec{\tau}]$.

•  $x = v[i] \leftarrow v'$ is mapped to the sequence:

mov $r_x, v$ ; mov $r_{temp}, v'$ ; sto $r_x[i], r_{temp}$

•  $v(v_1, \ldots, v_n)$ is mapped to the sequence:

mov $r_{temp_1}, v_1$ ; ... ; mov $r_{temp_n}, v_n$ ;
mov r1, $r_{temp_1}$ ; ... ; mov r$n$, $r_{temp_n}$ ; jmp $v$

•  $\mathrm{if0}(v, e_1, e_2)$ is mapped to the sequence:

mov $r_{temp}, v$ ; bnz $r_{temp}, \ell[\vec{\alpha}]$ ; $S_1$

where $\ell$ is bound in the heap to $\mathtt{code}[\vec{\alpha}]\Gamma.S_2$, the
translation of $e_i$ is $S_i$, the free type variables of $e_2$
are contained in $\vec{\alpha}$, and $\Gamma$ is the register file type
corresponding to the free variables of $e_2$.

•  $\mathrm{halt}[\tau]v$ is mapped to the sequence:

mov r1, $v$ ; halt$[\tau]$
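
The declaration cases of this informal summary can be sketched as a small code generator. This is a minimal sketch under stated assumptions rather than the compiler's actual code: the function name `translate`, the tuple encoding of declarations, and the policy of numbering fresh registers directly after the argument registers are all hypothetical. Only straight-line declarations followed by a halt are handled; calls and if0 are omitted.

```python
def translate(params, decls, halt_type, result):
    """Translate straight-line declarations plus a final halt into TAL text.

    params: variable names assumed to arrive in r1..rn.
    decls:  tuples whose first field names the form and second the bound variable.
    """
    env = {x: f"r{i}" for i, x in enumerate(params, start=1)}
    used = len(params)  # registers handed out so far
    code = []

    def fresh():
        nonlocal used
        used += 1
        return f"r{used}"

    def val(v):
        # A variable becomes its register; labels and integers are kept as written.
        return env.get(v, str(v))

    for d in decls:
        kind, x = d[0], d[1]
        rx = fresh()
        if kind == "val":        # x = v
            code.append(f"mov {rx}, {val(d[2])}")
        elif kind == "proj":     # x = pi_i(v)
            i, v = d[2], d[3]
            code += [f"mov {rx}, {val(v)}", f"ld {rx}, {rx}[{i}]"]
        elif kind == "arith":    # x = v1 p v2, with p one of add, sub, mul
            op, v1, v2 = d[2], d[3], d[4]
            code += [f"mov {rx}, {val(v1)}", f"{op} {rx}, {rx}, {val(v2)}"]
        elif kind == "unpack":   # [alpha, x] = unpack v
            alpha, v = d[2], d[3]
            code.append(f"unpack [{alpha}, {rx}], {val(v)}")
        elif kind == "malloc":   # x = malloc[types]
            code.append(f"malloc {rx}[{', '.join(d[2])}]")
        elif kind == "update":   # x = v[i] <- v'
            v, i, v2 = d[2], d[3], d[4]
            rt = fresh()
            code += [f"mov {rx}, {val(v)}", f"mov {rt}, {val(v2)}",
                     f"sto {rx}[{i}], {rt}"]
        else:
            raise ValueError(f"unknown declaration form {kind!r}")
        env[x] = rx  # later declarations see x in its register

    code += [f"mov r1, {val(result)}", f"halt[{halt_type}]"]
    return code


# x = a + 1 ; y = x * b ; halt[int] y, with a in r1 and b in r2
for line in translate(["a", "b"],
                      [("arith", "x", "add", "a", 1),
                       ("arith", "y", "mul", "x", "b")],
                      "int", "y"):
    print(line)
```

Allocating a new register for every bound variable is the simplest policy that satisfies the "fresh registers" requirement; since the type system places almost no restrictions on register allocation, a real compiler is free to reuse registers far more aggressively.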

Figure 11 gives a TAL representation of the factorial
computation.

The CPS, closure conversion, allocation, and code generation
translations each take a well-typed source term,
and produce a well-typed target term. Hence, the composition
of these translations is a type-preserving compiler
that takes well-typed $\lambda^F$ terms, and produces well-typed
TAL code. The soundness of the TAL type system
(Corollary 5.3) ensures that the resulting code will
either diverge or produce a TAL value of the appropriate
type.


6  Extensions  and  Practice 


We claim that the framework presented here is a practical
approach to compilation. To substantiate this claim,
we are constructing a compiler called TALC that maps
the KML programming language [10] to a variant of
the TAL described here, suitably adapted for the Intel
x86 family of processors. We have found it straightforward
to enrich the target language type system to
include support for other type constructors, such as references,
higher-order constructors, and recursive types.
We omitted discussion of these features here in order to
simplify the presentation.

Although this paper describes a CPS-based compiler,
we opted to use a stack-based compilation model in the
TALC compiler. Space considerations preclude a complete
discussion of the details needed to support stacks,
but the primary mechanisms are as follows: The size
of the stack and the types of its contents are specified
by stack types, and code blocks indicate stack types
describing the state of the stack they expect. Since code
is typically expected to work with stacks of varying size,
functions may quantify over stack type variables, resulting
in stack polymorphism.

Efficient support for disjoint sums and arrays also requires
considerable additions to the type system. For
sums, the critical issue is making the projection and
testing of tags explicit. In a naive implementation, the
connection between a sum and its tag is forgotten once
the tag is loaded. For arrays the issue is that the index
for a subscript or update operation must be checked to
see that it is in bounds. Exposing the bounds check either
requires a fixed code sequence, thereby constraining
optimization, or else the type system must be strengthened
so that some (decidable) fragment of arithmetic
can be encoded in the types. Sums may also be implemented
with either of the above techniques, or by
using abstract types to tie sums to their tags. In the
TALC compiler, in order to retain a simple type system
and economical typechecking, we have initially opted for
fixed code sequences but are exploring the implications
of the more complicated type systems.

Finally, since we chose a type-erasure interpretation of
polymorphism, adding floats to the language requires a
boxing translation. However, recent work by Leroy [23]
suggests that it is only important to unbox floats in arrays
and within compilation units, which is easily done
in our framework.

7  Summary 

We have given a compiler from System F to a statically
typed assembly language. The type system for the assembly
language ensures that source level abstractions
such as closures and polymorphic functions are enforced
at the machine-code level. Furthermore, the type system
does not preclude aggressive low-level optimization,
such as register allocation, instruction selection, or
instruction scheduling. In fact, programmers concerned
with efficiency can hand-code routines in assembly, as
long as the resulting code typechecks. Consequently,
TAL provides a foundation for high-performance computing
in environments where untrusted code must be
checked for safety before being executed.

Figure 9: Static Semantics of TAL (except instructions)

Figure 10: Static Semantics of TAL instructions

fact = (H, {}, S) where

H =

    l_fact:
      code[]{r1:⟨⟩, r2:int, r3:τk}.
        bnz r2, l_nonzero
        unpack [α, r3], r3      % zero branch: call k (in r3) with 1
        ld r4, r3[0]            % project k code
        ld r1, r3[1]            % project k environment
        mov r2, 1
        jmp r4                  % jump with {r1 = env, r2 = 1}
    l_nonzero:
      code[]{r1:⟨⟩, r2:int, r3:τk}.
        sub r6, r2, 1           % n − 1
        malloc r4[int, τk]      % create environment for cont in r4
        sto r4[0], r2           % store n into environment
        sto r4[1], r3           % store k into environment
        malloc r3[∀[].{r1:⟨int^1, τk^1⟩, r2:int}, ⟨int^1, τk^1⟩]
                                % create cont closure in r3
        mov r5, l_cont
        sto r3[0], r5           % store cont code
        sto r3[1], r4           % store environment ⟨n, k⟩
        mov r2, r6              % arg := n − 1
        mov r3, pack [⟨int^1, τk^1⟩, r3] as τk
                                % abstract the type of the environment
        jmp l_fact              % jump to l_fact with {r1 = env, r2 = n − 1, r3 = cont}
    l_cont:
      code[]{r1:⟨int^1, τk^1⟩, r2:int}.
                                % r2 contains (n − 1)!
        ld r3, r1[0]            % retrieve n
        ld r4, r1[1]            % retrieve k
        mul r2, r3, r2          % n × (n − 1)!
        unpack [α, r4], r4      % unpack k
        ld r5, r4[0]            % project k code
        ld r1, r4[1]            % project k environment
        jmp r5                  % jump to k with {r1 = env, r2 = n!}
    l_halt:
      code[]{r1:⟨⟩, r2:int}.
        mov r1, r2
        halt[int]               % halt with result in r1

and S =

        malloc r3[∀[].{r1:⟨⟩, r2:int}, ⟨⟩]
                                % create halt closure in r3
        mov r1, l_halt
        sto r3[0], r1
        malloc r1[]             % create an empty environment (⟨⟩)
        sto r3[1], r1           % store ⟨⟩ into closure, still in r1
        mov r3, pack [⟨⟩, r3] as τk
                                % abstract the type of the environment
        mov r2, 6               % load argument (6)
        jmp l_fact              % begin factorial with {r1 = ⟨⟩, r2 = 6, r3 = haltcont}

and τk = ∃α.⟨∀[].{r1:α, r2:int}^1, α^1⟩

Figure 11: Typed Assembly Code for Factorial